QueST: Incentivizing LLMs to Generate Difficult Problems
Hu, Hanxu, Zhang, Xingxing, Vamvas, Jannis, Sennrich, Rico, Wei, Furu
Large Language Models have achieved strong performance on reasoning tasks, solving competition-level coding and math problems. However, their scalability is limited by human-labeled datasets and the lack of large-scale, challenging coding problem training data. Existing competitive coding datasets contain only thousands to tens of thousands of problems. Previous synthetic data generation methods rely on either augmenting existing instruction datasets or selecting challenging problems from human-labeled data. In this paper, we propose QueST, a novel framework which combines difficulty-aware graph sampling and difficulty-aware rejection fine-tuning that directly optimizes specialized generators to create challenging coding problems. Our trained generators demonstrate superior capability compared to even GPT-4o at creating challenging problems that benefit downstream performance. We leverage QueST to generate large-scale synthetic coding problems, which we then use to distill from strong teacher models with long chain-of-thought or to conduct reinforcement learning for smaller models, proving effective in both scenarios. Our distillation experiments demonstrate significant performance gains. Specifically, after fine-tuning Qwen3-8B-base on 100K difficult problems generated by QueST, we surpass the performance of the original Qwen3-8B on LiveCodeBench. With an additional 112K examples (i.e., 28K human-written problems paired with multiple synthetic solutions), our 8B model matches the performance of the much larger DeepSeek-R1-671B. These findings indicate that generating complex problems via QueST offers an effective and scalable approach to advancing the frontiers of competitive coding and reasoning for large language models.
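The core filtering idea behind difficulty-aware rejection can be sketched in a few lines: keep only generated problems that a reference solver rarely solves, then train the generator on those. The threshold, problem names, and solve rates below are invented for illustration; the paper's actual pipeline (graph sampling plus rejection fine-tuning) is considerably more involved.

```python
# Minimal sketch of difficulty-aware rejection filtering. All names, rates,
# and the 0.2 threshold are illustrative placeholders, not from the paper.

def reject_easy(problems, solve_rate, max_rate=0.2):
    """Keep problems whose empirical solve rate is at most max_rate."""
    return [p for p in problems if solve_rate[p] <= max_rate]

# Hypothetical per-problem solve rates measured with a reference model.
solve_rate = {"two-sum": 0.95, "segment-tree-beats": 0.10, "fft-convolution": 0.18}
hard = reject_easy(list(solve_rate), solve_rate)
print(hard)  # ['segment-tree-beats', 'fft-convolution']
```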
Efficient Prediction of Pass@k Scaling in Large Language Models
Kazdan, Joshua, Schaeffer, Rylan, Allouah, Youssef, Sullivan, Colin, Yu, Kyssen, Levi, Noam, Koyejo, Sanmi
Assessing the capabilities and risks of frontier AI systems is a critical area of research, and recent work has shown that repeated sampling from models can dramatically increase both. For instance, repeated sampling has been shown to increase their capabilities, such as solving difficult math and coding problems, but it has also been shown to increase their potential for harm, such as being jailbroken. Such results raise a crucial question for both capability and safety forecasting: how can one accurately predict a model's behavior when scaled to a massive number of attempts, given a vastly smaller sampling budget? This question is directly relevant to model providers, who serve hundreds of millions of users daily, and to governmental regulators, who seek to prevent harms. To answer this question, we make three contributions. First, we find that standard methods for fitting these laws suffer from statistical shortcomings that hinder predictive accuracy, especially in data-limited scenarios. Second, we remedy these shortcomings by introducing a robust estimation framework, which uses a beta-binomial distribution to generate more accurate predictions from limited data. Third, we propose a dynamic sampling strategy that allocates a greater budget to harder problems. Combined, these innovations enable more reliable prediction of rare risks and capabilities at a fraction of the computational cost.
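The beta-binomial idea has a convenient closed form worth noting: if each problem's per-attempt success rate p is modeled as Beta(α, β), then pass@k = 1 − E[(1−p)^k] = 1 − B(α, β+k)/B(α, β), which lets a fit from a small sampling budget be extrapolated to large k. The sketch below shows only this formula; the α, β values are illustrative, not fits from the paper.

```python
from math import lgamma, exp

def log_beta(a, b):
    # log B(a, b) computed via log-gamma for numerical stability
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def pass_at_k(alpha, beta, k):
    # If a problem's per-attempt success rate p ~ Beta(alpha, beta), then
    # pass@k = 1 - E[(1 - p)^k] = 1 - B(alpha, beta + k) / B(alpha, beta).
    return 1.0 - exp(log_beta(alpha, beta + k) - log_beta(alpha, beta))

# Sanity check: with a uniform Beta(1, 1) prior, pass@k = k / (k + 1).
print(round(pass_at_k(1.0, 1.0, 9), 6))  # 0.9
```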
R1: Comparison with inexact methods. Aligning with prior exact papers [10, 18], we focus on comparisons with exact methods.

We thank all five reviewers for their detailed and incisive feedback. We tested AustereMH [16], an inexact method, on robust linear regression in Section 5.1 with ... We added this to the Appendix. This does not affect the properties of TunaMH. Our theorem doesn't have this assumption; it suggests that for MHSubLhd with given user-specified ... The impact is 3-fold: it (1) provides an upper bound on performance for algorithms of Algorithm 1's form (TunaMH); (3) suggests directions for developing new algorithms. To be significantly faster than TunaMH, we either need more assumptions about the problem or new stateful algorithms.
Long Is More Important Than Difficult for Training Reasoning Models
Shen, Si, Huang, Fei, Zhao, Zhixiao, Liu, Chang, Zheng, Tiansheng, Zhu, Danhao
Difficult problems, which often result in long reasoning traces, are widely recognized as key factors for enhancing the performance of reasoning models. However, such high-challenge problems are scarce, limiting the size of available datasets. In this paper, we propose a simple method to decouple the reliance on problem difficulty. First, we empirically demonstrate that reasoning length, rather than problem difficulty, primarily influences the performance of trained models. Second, we identify a scaling law on reasoning length, showing that model performance increases in a log-linear fashion as the reasoning data length grows. Finally, we introduce a straightforward technique to generate reasoning data of arbitrary length, and show that the synthesized data is effective for training reasoning models. After fine-tuning the Qwen2.5-32B-Instruct language model on our Long1K dataset, we present our model, Long1K-32B, which achieves remarkable performance with only 1,000 training samples: 95.6\% accuracy on MATH and 71.1\% on GPQA, outperforming DeepSeek-R1-Distill-Qwen-32B. The model, code, and dataset are all open-sourced, available at https://huggingface.co/ZTss/LONG1.
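A log-linear scaling law of the kind described means accuracy grows as a + b·ln(length), so each doubling of reasoning length adds a fixed number of accuracy points (b·ln 2). A minimal least-squares fit illustrates this; the (length, accuracy) pairs below are invented purely for the illustration, not taken from the paper.

```python
import math

# Hypothetical (reasoning length, accuracy) pairs showing a log-linear trend.
data = [(1000, 40.0), (2000, 46.0), (4000, 52.5), (8000, 58.0)]

# Ordinary least squares for accuracy = a + b * ln(length).
xs = [math.log(length) for length, _ in data]
ys = [acc for _, acc in data]
n = len(data)
xbar, ybar = sum(xs) / n, sum(ys) / n
b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum(
    (x - xbar) ** 2 for x in xs
)
a = ybar - b * xbar

# Under this fit, each doubling of length adds b * ln(2) accuracy points.
print(round(b * math.log(2), 2))  # 6.05
```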
Reviews: Generative Modeling by Estimating Gradients of the Data Distribution
The paper proposes to perform Langevin dynamics in data space (as opposed to the latent space) of a deep generative model as a means to explore the data distribution. This reduces the difficult problem of estimating the data distribution to the slightly less difficult problem of estimating its gradients. The latter are estimated by different versions of score matching. This paper mainly builds on recent work on score matching by random projections. As a result, a new generative model is achieved whose sample quality is similar to GANs, while avoiding an adversarial training paradigm. This is a strong contribution.
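The Langevin dynamics the review refers to follows the update x ← x + (ε/2)·∇log p(x) + √ε·z with z ~ N(0, 1). The toy sketch below uses a Gaussian whose score is known in closed form as a stand-in for the learned score network; step size, burn-in, and target parameters are arbitrary choices for illustration.

```python
import math
import random

random.seed(0)

# Target density p = N(mu, sigma^2); its score is known in closed form here,
# standing in for a learned score model.
mu, sigma = 3.0, 1.0

def score(x):
    # grad_x log p(x) for a Gaussian target
    return -(x - mu) / sigma**2

# Unadjusted Langevin dynamics: x <- x + (eps/2) * score(x) + sqrt(eps) * z.
eps = 0.01
x = 0.0
samples = []
for step in range(20000):
    x += 0.5 * eps * score(x) + math.sqrt(eps) * random.gauss(0.0, 1.0)
    if step >= 2000:  # discard burn-in
        samples.append(x)

# The sample mean should land close to mu = 3.0.
mean = sum(samples) / len(samples)
```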
CppFlow: Generative Inverse Kinematics for Efficient and Robust Cartesian Path Planning
Morgan, Jeremy, Millard, David, Sukhatme, Gaurav S.
In this work we present CppFlow - a novel and performant planner for the Cartesian Path Planning problem, which finds valid trajectories up to 129x faster than current methods, while also succeeding on more difficult problems where others fail. At the core of the proposed algorithm is the use of a learned, generative Inverse Kinematics solver, which is able to efficiently produce promising entire candidate solution trajectories on the GPU. Precise, valid solutions are then found through classical approaches such as differentiable programming, global search, and optimization. In combining approaches from these two paradigms we get the best of both worlds - efficient approximate solutions from generative AI which are made exact using the guarantees of traditional planning and optimization. We evaluate our system against other state-of-the-art methods on a set of established baselines as well as new ones introduced in this work, and find that our method significantly outperforms others in terms of the time to find a valid solution and planning success rate, and performs comparably in terms of trajectory length over time. The work is made open source and available for use upon acceptance.
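The generate-then-refine pattern the abstract describes — cheap approximate candidates from a generative stage, made exact by classical optimization — can be shown on a toy 1-D problem. This is not CppFlow's actual pipeline; the objective, candidate count, and step size are all illustrative stand-ins.

```python
import random

random.seed(0)

# Toy objective standing in for a trajectory cost; its minimum is at x = 2.0.
def cost(x):
    return (x - 2.0) ** 2

# Stage 1 (generative stand-in): cheaply sample many rough candidates.
candidates = [random.uniform(-10.0, 10.0) for _ in range(64)]
seed_x = min(candidates, key=cost)

# Stage 2 (classical refinement): gradient descent from the best candidate.
x = seed_x
for _ in range(200):
    grad = 2.0 * (x - 2.0)  # derivative of cost
    x -= 0.1 * grad

# x is now essentially at the optimum, x ~ 2.0.
```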
Accelerating laboratory automation through robot skill learning
Materials discovery plays a pivotal role in addressing global challenges. The applications of new materials range from clean energy storage, to sustainable polymers and packaging for consumer products in a more circular economy, to drugs and therapeutics. During the COVID-19 pandemic, scientists had to halt experiments due to stringent social distancing measures, or to accelerate their efforts towards quickly producing a vaccine; partly as a result, there has recently been increased interest in using robotics and automation in laboratory environments. The challenge is that laboratories have been designed by and for humans, so the available glassware, tools and equipment pose difficult problems for traditional automation methods, which are inherently open loop and not adaptable. Learning-based methods that rely on autonomous trial and error are increasingly being used to achieve robotic tasks that could not previously be addressed with automation.
A Gentle Introduction to Bayesian Inference
In this article, we have seen the Bayesian approach in action with the help of a small example. It combines prior knowledge with observed data to form a posterior, much as humans intuitively update their beliefs. This is clearly better than discarding the data and proceeding with the prior alone. It also subsumes the maximum likelihood method: choose a flat prior, i.e. one that assigns the same probability (or density) to every possible value of θ and is essentially a constant, and the posterior mode coincides with the maximum likelihood estimate. Furthermore, the Bayesian method gives you an entire distribution over the parameters, while the maximum likelihood method yields only a point estimate.
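The prior-to-posterior update described above can be made concrete with a coin-flip example using the conjugate Beta-binomial pair (all numbers below are arbitrary illustration values, not from the article):

```python
# Conjugate Beta-binomial update for a coin's heads probability theta.

a, b = 2, 2            # Beta(2, 2) prior: 2 pseudo-heads, 2 pseudo-tails
heads, tails = 7, 3    # observed data

# The posterior is Beta(a + heads, b + tails): the data simply add to the
# prior pseudo-counts.
post_a, post_b = a + heads, b + tails

posterior_mean = post_a / (post_a + post_b)  # 9 / 14
mle = heads / (heads + tails)                # 7 / 10, ignores the prior

print(round(posterior_mean, 3))  # 0.643
print(mle)                       # 0.7
```

Note how the prior pulls the posterior mean (0.643) slightly toward 0.5 compared with the maximum likelihood estimate (0.7), and how the posterior is a full distribution, not just this single number.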
Why Speech Separation is Such a Difficult Problem to Solve
You are talking on the phone, recording audio, or speaking to a voice assistant like Google Assistant, Cortana, or Alexa. But the person on the other side of the call cannot hear you because you are in a crowded place, the recording has a lot of background noise, or the "Hey, Alexa" wake phrase wasn't picked up by your device because someone else started speaking. All of these problems relate to separating voices, informally referred to as the "cocktail party problem", and they have been addressed using artificial intelligence and deep learning methods in recent years. Still, separating and inferring multiple simultaneous voices is a difficult problem to solve completely. To start with a definition: speech separation is the task of extracting the speech of the "wanted speaker" or "speaker of interest" from an overlapping mixture of speech from other speakers, which is also referred to as "noise".